Information Visualization: Second Practical Work
Authors:
- David Gallardo
- Pau Amargant
Introduction
The goal of this project is to further analyze the New York traffic accident dataset, with a focus on interactive visualization. Specifically, we set out to create an interactive visualization with multiple views that allows users to answer the following questions:
- Which weather condition and type of vehicle were present in the majority of accidents each month? And in the combination of all the months?
- In which area and at what hour did the majority of accidents each month happen? And in the combination of all the months?
- Which area presented the majority of taxi accidents during rainy days in June on Mondays at noon (12 pm)?
- Which day had the most accidents during clear days in July in Manhattan?
With these questions in mind, we have created a multiview visualization that shows the user the information required to answer them. The visualization also lets the user filter and select the data in order to answer more specific questions.
To be able to run this notebook, please install the required libraries by running the following cell. In case issues arise, please refer to the README.md file.
!pip install -r requirements.txt
We import the necessary libraries and our custom-made functions, stored in the graphs.py file.
from graphs import *
Preprocessing
To begin with, we describe how we have preprocessed the data. Most of the process is the same as in the first practical work. First of all, the data was preprocessed using OpenRefine, a tool that makes it easy to clean and transform data. The main steps performed were the following:
- Normalization of vehicle type names, focusing on the three vehicle types analyzed in this work: taxis, fire trucks and ambulances. The normalization consisted of grouping the different names used to refer to the same vehicle type.
- Selecting the relevant columns for our analysis and setting the proper data types.
- Selecting the accidents which took place in the time interval we are interested in (June to September 2018).
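The normalization step can be illustrated with a small Python sketch. The actual grouping was done interactively in OpenRefine, so the raw labels and the mapping below are hypothetical examples; only the three target categories (`TAXI`, `AMBULANCE`, `FIRE`, the values later used in the vehicle chart) come from this notebook.

```python
# Hypothetical raw labels grouped under each canonical vehicle type.
# The real OpenRefine clustering rules are not reproduced here.
CANONICAL_TYPES = {
    "TAXI": {"taxi", "yellow taxi", "nyc taxi", "taxi cab"},
    "AMBULANCE": {"ambulance", "ambul", "amb"},
    "FIRE": {"fire truck", "firetruck", "fire engine", "fdny fire"},
}

def normalize_vehicle_type(raw):
    """Map a raw vehicle label to TAXI, AMBULANCE or FIRE, or None if it matches none."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase and collapse inner spaces
    for canonical, variants in CANONICAL_TYPES.items():
        if key in variants:
            return canonical
    return None

raw_labels = ["Taxi", "  YELLOW   taxi", "Ambul", "FIRETRUCK", "Sedan"]
print([normalize_vehicle_type(label) for label in raw_labels])
# ['TAXI', 'TAXI', 'AMBULANCE', 'FIRE', None]
```

Labels that do not match any variant are left unassigned, which is why the choice of variants determines which vehicles end up counted in each category.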
The dataset was exported as a CSV file and further processed using Python. The main steps performed are the following:
- Date variables were converted to datetime objects.
- Auxiliary columns were created to store the day of the week, the hour of the day, and the month of the accident.
- The coordinates of the accidents were converted to geometry format and used to obtain the borough of the accident, as in the original dataset this column contained many missing values.
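As an illustration, the two Python steps above can be sketched with the standard library alone. The input column names `CRASH DATE` and `CRASH TIME`, their formats, and the rectangular borough outline are assumptions made for the example (the output names `HOUR`, `dayname` and `monthname` are the ones used in the charts); the notebook itself works with DataFrames and the real borough geometries. The borough assignment is shown as an even-odd ray-casting point-in-polygon test, which is the core operation behind a spatial join.

```python
from datetime import datetime

def add_time_columns(record):
    """Return a copy of an accident record with HOUR, dayname and monthname columns.

    Assumes hypothetical 'CRASH DATE' (MM/DD/YYYY) and 'CRASH TIME' (H:MM) fields.
    """
    ts = datetime.strptime(f"{record['CRASH DATE']} {record['CRASH TIME']}", "%m/%d/%Y %H:%M")
    return {**record, "HOUR": ts.hour, "dayname": ts.strftime("%A"), "monthname": ts.strftime("%B")}

def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: count the polygon edges crossed by a ray going east from the point."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > lat) != (y2 > lat):  # the edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_borough(lon, lat, boroughs):
    """Name of the first borough polygon containing the point, or None."""
    for name, polygon in boroughs.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None

# Toy rectangle standing in for a real borough boundary (not actual geometry)
TOY_BOROUGHS = {"Manhattan": [(-74.02, 40.70), (-73.91, 40.70), (-73.91, 40.88), (-74.02, 40.88)]}

record = add_time_columns({"CRASH DATE": "06/04/2018", "CRASH TIME": "12:30"})
print(record["HOUR"], record["dayname"], record["monthname"])  # 12 Monday June
print(assign_borough(-73.95, 40.75, TOY_BOROUGHS))             # Manhattan
```

Real borough boundaries are multipolygons with several parts, so a library spatial join is preferable in practice; the sketch only shows the underlying test.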
It is worth noting that we followed the instructions regarding which vehicle types to classify as fire trucks, ambulances and taxis. Nonetheless, some additional vehicle types might also have been assigned to one of the three categories because of the extensive preprocessing performed during the first practical work.
Furthermore, not all the columns in the dataset have been used; only the relevant ones have been selected. When building the visualizations, only the strictly necessary columns are passed to Altair in order to avoid unnecessary computations. However, because the visualization uses many interactions, many columns still have to be passed so that the corresponding filters can be applied.
data=get_accident_data(fname='dataset_v1.csv') # this function processes the data and returns the dataframe
accident_data = get_weather_data(data) # this function processes the data and returns the dataframe which also includes weather data
Design Process
Given the interactive and multiview nature of the visualization, the design process has been iterative and has involved all the views. From the start, we tried to keep the design as simple as possible, avoiding unnecessary elements and focusing on answering the questions clearly. With this objective in mind, we began by sketching the overall design of the visualization, which is shown in the following image. This first sketch focused on deciding how the views created for the first course assignment could be used to answer the questions. The main idea was to use the map to show the location of the accidents (as dots) and the number of accidents per borough (as a choropleth). Furthermore, a bar chart would show the accidents per vehicle type, a time series would show the accidents per hour, and another bar chart would show the accidents per month. Finally, a lollipop chart would show, for each weather condition, the difference with respect to the mean number of accidents per day. The view created for the first assignment is also included to make the design easier to understand.
Through interactions, such as selecting bars, points and boroughs in the map, or an interval of time, we felt that we could answer some, but not all, of the questions. However, before further refining the design, we decided to implement the basics of the visualization in order to validate whether the design was feasible from a technical point of view. We already knew that the design would have to change, as the existing views could not answer all the questions, especially those which involved selecting specific days of the week and weeks within a month.
First Prototypes
We began by implementing each view independently, with interactions affecting only the view itself. This made it easier to debug problems and to check the viability of the visualization, as both Altair and Streamlit have known bugs and issues which limit what can be done.
In the following sections, the design process for each view (without taking into account inter-view interactions) is described, together with the final design and analysis of the view.
Map
We began by implementing the interactive map visualization. Our objectives were to create a visualization which used a choropleth to show the number of accidents per borough (using a mark_geoshape which encoded the count as color) and the position of each accident (using a mark_point encoding the coordinates) superposed on top of the choropleth. Furthermore, we wanted to be able to select a borough and a group of points, which would highlight them and update the other views so that they only showed the data corresponding to the selection.
However, we encountered several problems caused by the inner workings of Altair and by a known issue that Streamlit and Altair have with GeoDataFrames (GitHub issue #1002): when rendering an Altair map chart in Streamlit, a remote data source must be used, and a GeoDataFrame cannot be used. This limits the ability to make an interactive choropleth for the following reasons:
- If the accident dataset is used as the data source and the geometry is looked up, Streamlit raises an error due to the known bug.
- If the geometry is used as the data source and the accident data is looked up, the result is not correct, because the lookup performs a one-sided (left) join rather than an inner join.
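A simplified model of the difference, in plain Python: with the geometry as the primary data, a lookup keeps every borough but attaches only one matching accident to it, so accidents cannot be counted per borough and boroughs without accidents still appear; an inner join would instead produce one row per accident. This is a conceptual sketch, not Vega-Lite's implementation; in particular, which matching row a real lookup keeps is not modelled (this version keeps the first), and the sample rows are hypothetical.

```python
def lookup_join(primary, secondary, key, fields):
    """Left join: every primary row is kept and receives the fields of one matching
    secondary row, or None when there is no match (a simplified model of a lookup)."""
    index = {}
    for row in secondary:
        index.setdefault(row[key], row)  # only one secondary row is kept per key
    joined = []
    for row in primary:
        match = index.get(row[key])
        extra = {f: (match[f] if match is not None else None) for f in fields}
        joined.append({**row, **extra})
    return joined

def inner_join(primary, secondary, key):
    """Inner join: one output row per matching pair; unmatched rows are dropped."""
    return [{**p, **s} for p in primary for s in secondary if p[key] == s[key]]

boroughs = [{"name": "Bronx"}, {"name": "Queens"}]  # stand-ins for the geometry features
accidents = [{"name": "Bronx", "id": 1}, {"name": "Bronx", "id": 2}]

print(lookup_join(boroughs, accidents, "name", ["id"]))
# [{'name': 'Bronx', 'id': 1}, {'name': 'Queens', 'id': None}]
print(inner_join(boroughs, accidents, "name"))
# [{'name': 'Bronx', 'id': 1}, {'name': 'Bronx', 'id': 2}]
```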
Therefore, it was decided to instead use a map containing the boroughs as a base layer, without encoding the count as color, and the locations of the accidents superposed over it. Furthermore, it was decided that the following interactions would be implemented:
- When one clicks one or more boroughs, the opacity of the other boroughs is reduced to 0.2 and only those accidents which happened in the selected boroughs are shown.
- When one selects an area of the map, the selected accidents are highlighted and the other views are updated to only show the data corresponding to the selection.
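The borough highlighting can be sketched as follows. It mirrors the `alt.condition(selection_buro, alt.value(0.6), alt.value(0.2))` encoding of the base map and the borough part of the points' opacity condition (the area selection is left out here), under the assumption, which matches Vega-Lite's default, that an empty selection matches every mark. The helper name is hypothetical.

```python
def conditional_value(value, selected, if_true, if_false):
    """Sketch of alt.condition for a point selection on one field.

    Assumes Vega-Lite's default behavior: an empty selection matches every mark.
    """
    if not selected or value in selected:
        return if_true
    return if_false

boroughs = ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]
clicked = {"Manhattan", "Queens"}

# Borough shapes: 0.6 when selected, 0.2 otherwise
shape_opacity = {b: conditional_value(b, clicked, 0.6, 0.2) for b in boroughs}
# Accident points: fully visible in selected boroughs, hidden elsewhere
point_opacity = {b: conditional_value(b, clicked, 1, 0) for b in boroughs}
print(shape_opacity)
print(point_opacity)
```

Because an empty selection matches everything, the map starts with all boroughs at the "selected" opacity and every point visible.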
Furthermore, to show the exact number of accidents in each borough, a bar chart was added to the map view. This bar chart shows the number of accidents per borough and is updated when a selection is made. It is worth noting that we did not use an interactive tooltip over the chart, as it was not possible due to technical limitations.
The visualization is shown in the following cell:
import altair as alt

# We get the map from a remote source
ny = "https://raw.githubusercontent.com/pauamargant/VI_P1/main/resources/new-york-city-boroughs.geojson"
data_geojson_remote = alt.Data(
url=ny, format=alt.DataFormat(property="features", type="json")
)
# Selector of field name (borough name)
selection_buro = alt.selection_point(fields=["name"])
# We define the width of the visualization and the ratio between the map and the bar chart
w = 800
h = 500
ratio = 0.8
# We also add a selection for the accidents, which allows the user to select an area of the map and see the accidents in that area
selection_acc_map = alt.selection_interval(fields=["LONGITUDE", "LATITUDE"])
# We define the base map, which uses mark_geoshape and its boroughs can be clicked to select them
base = (
alt.Chart(data_geojson_remote)
.mark_geoshape(fill="lightgrey", stroke="white")
.project(type="albersUsa")
.encode(
opacity=alt.condition(selection_buro, alt.value(0.6), alt.value(0.2)),
tooltip=[alt.Tooltip("name:N", title="Borough")],
)
.properties(width=w * ratio, height=h)
    .add_params(selection_acc_map)
)
# We create the points corresponding to the accidents
points = (
alt.Chart(accident_data)
.mark_circle()
.encode(
longitude="LONGITUDE:Q",
latitude="LATITUDE:Q",
size=alt.value(2),
color=alt.Color(
"name:N",
legend=alt.Legend(title="Borough", orient="top-left"),
).scale(
# scheme="category20c"
range=["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3", "#a6d854"]
),
opacity=alt.condition(
selection_buro & selection_acc_map, alt.value(1), alt.value(0)
),
tooltip=alt.value(None),
)
)
# We superpose the points on the map using mark_circle. The points encode latitude and longitude through the spatial
# channel, which is powerful and appropriate for this kind of data. The selection from the base map is used to highlight the
# accidents of the selected boroughs.
# We define the bar chart, which uses mark_bar and encodes the number of accidents per borough. It also uses the selection from the
# base map to highlight the accidents of the selected boroughs.
bar_chart = (
alt.Chart(accident_data)
.mark_bar()
.transform_filter(
selection_acc_map
)
.encode(
x=alt.X("count()", axis=alt.Axis(title=None)),
y=alt.Y("name:N", axis=alt.Axis(title="Boroughs")).sort("-x"),
opacity=alt.condition(selection_buro, alt.value(1), alt.value(0.4)),
color=alt.Color("name:N", legend=None).scale(
range=["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3", "#a6d854"]
),
tooltip=[
alt.Tooltip("count()", title="No. accidents"),
alt.Tooltip("name:N", title="Borough"),
],
)
.properties(width=w * 0.3, height=h, title="Accidents by Borough")
)
# We create the final view, which superposes the base map and the points. We also add the bar chart to the right of the map.
((base + points) | bar_chart).resolve_scale(color="shared").add_params(selection_buro)
The previous cell implements the view used to answer the questions related to the location of the accidents, and the code comments explain the implementation details. Here we describe the design decisions in more detail (inter-view interactions are explained in later sections).
To begin with, the view is composed of two juxtaposed views which use different encodings for the same data (which is altered by the interactions). In terms of interaction, the two views are coordinated (linked): selecting, and thereby highlighting, a borough in either view propagates the selection to the other view, which is an example of linked highlighting. Conceptually (even though it is not implemented this way in Altair), the opacity channel can be considered shared between the two views.
Furthermore, selecting an area of the map so that only the accidents inside it are counted in the bar chart can be considered a form of overview+detail (a subset of the data shown with a different encoding), as the bar chart gives further detail about how the selected points are distributed among the boroughs. It is worth noting that, when an area is selected, the map shows the selected region but still displays all the points, since the user may want to change the selected area. The bar chart, in contrast, only takes into account the accidents located inside the selected area.
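The detail view can be sketched as a filter followed by a count, which is what `transform_filter(selection_acc_map)` followed by `count()` expresses in the Altair code. The brush is modelled here as a pair of longitude and latitude intervals, and the sample rows are hypothetical (they only reuse the notebook's column names `name`, `LONGITUDE` and `LATITUDE`).

```python
from collections import Counter

def in_brush(lon, lat, brush):
    """True if a point lies in the interval selection.

    brush is ((lon_min, lon_max), (lat_min, lat_max)); None means no area is selected,
    in which case every point passes (as with an empty selection).
    """
    if brush is None:
        return True
    (lon_min, lon_max), (lat_min, lat_max) = brush
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max

def accidents_per_borough(accidents, brush):
    """Detail view: number of accidents per borough, counting only those inside the brush."""
    return dict(Counter(a["name"] for a in accidents if in_brush(a["LONGITUDE"], a["LATITUDE"], brush)))

# Hypothetical sample rows using the column names of the notebook
accidents = [
    {"name": "Manhattan", "LONGITUDE": -73.98, "LATITUDE": 40.76},
    {"name": "Manhattan", "LONGITUDE": -73.97, "LATITUDE": 40.78},
    {"name": "Brooklyn", "LONGITUDE": -73.95, "LATITUDE": 40.65},
]
print(accidents_per_borough(accidents, None))                            # {'Manhattan': 2, 'Brooklyn': 1}
print(accidents_per_borough(accidents, ((-74.0, -73.9), (40.7, 40.8))))  # {'Manhattan': 2}
```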
The accident points on the map and the bars are colored using a categorical color map. Both share the same color encoding, which makes it easier for the user to relate each bar to its borough on the map.
It is also worth noting that the map view uses the Albers USA projection (`albersUsa` in Vega-Lite). Albers is an equal-area conic projection which is commonly used for maps of the United States. It is a good choice because it preserves relative area and, most importantly, it is widely used by US government agencies, so it is the projection most people are used to seeing.
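To make the equal-area property concrete, the sketch below implements the standard spherical formulas of the Albers equal-area conic projection on a unit sphere and checks numerically that the projected area of a small patch equals its area on the sphere, cos φ dλ dφ. The parameters (standard parallels 29.5°N and 45.5°N, origin 23°N, central meridian 96°W) are the ones commonly used for the contiguous US and are an assumption of this example; this is not Vega's `albersUsa` implementation, which additionally places Alaska and Hawaii in insets and applies its own scaling.

```python
import math

def albers(lon, lat, lon0=-96.0, lat0=23.0, lat1=29.5, lat2=45.5):
    """Project (lon, lat) in degrees with the spherical Albers equal-area conic (unit sphere)."""
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    n = (math.sin(phi1) + math.sin(phi2)) / 2          # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = math.sqrt(c - 2 * n * math.sin(phi)) / n     # radius of the projected parallel
    rho0 = math.sqrt(c - 2 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)                           # angle of the projected meridian
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)

def area_scale(lon, lat, step=1e-4):
    """Ratio between projected area and spherical area around (lon, lat); 1 means equal-area."""
    x, y = albers(lon, lat)
    x_lon, y_lon = albers(lon + step, lat)
    x_lat, y_lat = albers(lon, lat + step)
    # Finite-difference Jacobian determinant, with the degree step converted to radians
    d = math.radians(step)
    jacobian = abs((x_lon - x) * (y_lat - y) - (x_lat - x) * (y_lon - y)) / (d * d)
    return jacobian / math.cos(math.radians(lat))

print(round(area_scale(-74.0, 40.7), 4))   # around New York City
print(round(area_scale(-122.4, 37.8), 4))  # around San Francisco
```

Both ratios come out as 1 up to rounding, whereas shapes (angles) are distorted away from the standard parallels; this is the trade-off that makes the projection suitable for comparing accident densities across boroughs.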
Vehicle type chart
Visualizing the frequency of accidents involving different vehicle types proved to be a straightforward task, as we focused on only three categories: Taxi, Firetruck, and Ambulance.
To represent this data, we chose a simple horizontal bar chart that effectively communicates the number of accidents associated with each vehicle type. To enhance label readability, we employed horizontal bars and included only vertical gridlines to help readers discern the precise number of accidents for each vehicle.
Recognizing the limited scope of our data (only three pairs of values), we prioritized space efficiency. Consequently, we integrated icons adjacent to each bar, providing readers with a quick visual clue of the vehicle type without compromising space. Notably, we refrained from using color to encode any variables, opting for a minimalist approach that prioritizes simplicity.
The data is highly skewed: the Taxi category accounts for far more accidents than the other two categories.
Regarding interactivity, the chart allows users to filter the data by the vehicle type involved in each accident. When a bar is clicked, it is highlighted by reducing the opacity of the other bars.
selection_vehicle = alt.selection_point(on="click", fields=["VEHICLE TYPE CODE 1"])
# Horizontal bar chart with one bar per vehicle type; unselected bars are faded
bar_chart = (
    alt.Chart(accident_data)
    .mark_bar()
    .encode(
        y=alt.Y("VEHICLE TYPE CODE 1:N", title=None).sort("x"),
        x=alt.X("count()", title="Number of accidents"),
        tooltip=["VEHICLE TYPE CODE 1", "count()"],
        opacity=alt.condition(selection_vehicle, alt.value(1), alt.value(0.2)),
    )
    .add_params(selection_vehicle)
    .properties(width=600, height=300)
)
# Emoji icon placed at the end of each bar, derived from the vehicle type
text_emoji = (
    bar_chart.mark_text(align="left", baseline="middle", fontSize=40, dx=0)
    .encode(text=alt.Text("emoji:N"))
    .transform_calculate(
        emoji="{'AMBULANCE':'🚑','FIRE':'🚒','TAXI':'🚕'}[datum['VEHICLE TYPE CODE 1']]"
    )
)
bar_chart + text_emoji
Weather condition chart
To show the effect of different weather conditions on the number of accidents, we decided to create an interactive heatmap. Initially, we had implemented a bar chart with negative values which showed the effect of each condition on the average number of accidents per day. We discarded this idea because we believed it made the view unnecessarily complex and did not contribute enough towards answering the questions.
Color in a heatmap may not be the most accurate encoding of exact quantities, but it gives an immediate visual impression of the number of accidents for each weather condition. To keep the design streamlined, we did not add extra visual elements such as text or numbers, relying solely on the color gradient to convey information; exact quantities can be obtained through the tooltip. We also chose this design because we wanted the chart to act as a makeshift selection menu for filtering the rest of the data by weather condition. The user can select one or more weather conditions to filter the data. Nevertheless, the chart also works on its own, giving an overview of the data.
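Selecting one or more conditions relies on how a point selection reacts to clicks. The sketch below models what we understand to be Vega-Lite's defaults for `alt.selection_point`: a plain click replaces the selection, and a shift-click toggles the clicked value in or out. The function names and sample rows are hypothetical.

```python
def click(selected, value, shift=False):
    """Update the set of selected weather conditions after a click.

    Assumed point-selection defaults: a plain click replaces the selection with the
    clicked value; a shift-click toggles that value in or out of the selection.
    """
    if shift:
        return selected ^ {value}
    return {value}

def filter_by_conditions(accidents, selected):
    """Rows shown in the other views; an empty selection keeps everything."""
    return [a for a in accidents if not selected or a["conditions"] in selected]

accidents = [{"conditions": "Clear"}, {"conditions": "Rain"}, {"conditions": "Overcast"}]
selected = click(set(), "Clear")                # now only Clear is selected
selected = click(selected, "Rain", shift=True)  # Clear and Rain are selected
print(len(filter_by_conditions(accidents, selected)))  # 2
```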
custom_sort = ['Clear', 'Partially cloudy', 'Overcast', 'Rain', 'Rain, Overcast', 'Rain, Partially cloudy']
selection_cond = alt.selection_point(on="click", fields=["conditions"])
alt.Chart(accident_data).mark_rect().encode(
y=alt.Y("conditions:N", sort= custom_sort, axis=alt.Axis(title=None)),
color=alt.Color("count()", legend=alt.Legend(title="No. accidents"), scale=alt.Scale(scheme="blues")),
opacity=alt.condition(selection_cond, alt.value(1), alt.value(0.2)),
tooltip=[alt.Tooltip("count()", title="No. accidents"), alt.Tooltip("conditions:N", title="Weather")],
).properties(width=150, height=450, title="Weather conditions").add_params(selection_cond)
Time of day chart
We wanted to create a view which allowed the user to observe the distribution of accidents throughout the day and across the days of the week. We started off with two independent charts similar to our time of day chart from Project 1, encoding accidents by hour and by day of the week, respectively. We soon realized it made sense to merge both charts into a single view, combining the two bar charts into a heatmap. A heatmap proved more than sufficient for comparing quantities and distributions in this context. Nevertheless, we were still interested in showing the distributions of accidents by hour and by day of the week, so we implemented them as bar charts which act as an "extension" of the heatmap. These bar charts show the marginal distributions: for each column (hour) or row (day) of the heatmap, they show the total number of accidents.
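The relation between the heatmap and its two bar charts can be sketched in plain Python: the heatmap counts accidents per (day, hour) cell, and each bar chart sums those cells along one axis. The column names `dayname` and `HOUR` are the ones used in the notebook; the sample rows are hypothetical.

```python
from collections import Counter

def heatmap_with_marginals(accidents):
    """Return the heatmap cells (day, hour) and the two marginal distributions."""
    cells = Counter((a["dayname"], a["HOUR"]) for a in accidents)
    by_hour, by_day = Counter(), Counter()
    for (day, hour), count in cells.items():
        by_hour[hour] += count  # column sums: bar chart above the heatmap
        by_day[day] += count    # row sums: bar chart beside the heatmap
    return cells, by_hour, by_day

accidents = [
    {"dayname": "Monday", "HOUR": 12},
    {"dayname": "Monday", "HOUR": 12},
    {"dayname": "Monday", "HOUR": 8},
    {"dayname": "Friday", "HOUR": 12},
]
cells, by_hour, by_day = heatmap_with_marginals(accidents)
print(cells[("Monday", 12)], by_hour[12], by_day["Monday"])  # 2 3 3
```

Both marginals add up to the total number of accidents, which is why the bar charts can share the heatmap's axes without showing any additional data.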
Furthermore, the reader can select a specific hour and day of the week, and the chart will highlight its corresponding hour and day bars.
From a more theoretical point of view, this chart consists of coordinated views with linked highlighting. The x-axis of the heatmap is shared with the hourly bar chart above it, and the y-axis of the heatmap is shared with the weekday bar chart. The charts show the same data with different encodings and at different levels of aggregation.
selection_weekday = alt.selection_point(fields=["dayname"])
time_brush = alt.selection_point(fields=["HOUR"])
custom_sort = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
h = 500
w = 800
h1 = int(2 * h / 3)
h2 = int(1 * h / 3)
w1 = int(3 * w / 4)
w2 = int(w / 4)
base = (
alt.Chart()
.mark_rect()
.encode(
x=alt.X("HOUR:O", axis=alt.Axis(title="Hour of the day", labelAngle=0)),
y=alt.Y("dayname:O", sort=custom_sort, axis=alt.Axis(title=None)),
)
)
times_of_day = (
base.mark_rect(stroke="grey")
.encode(
color=alt.Color(
"count()",
scale=alt.Scale(scheme="tealblues"),
legend=alt.Legend(title="Number of accidents"),
),
opacity=alt.condition(
time_brush & selection_weekday, alt.value(1), alt.value(0.2)
),
tooltip=[
alt.Tooltip("count()", title="No. accidents"),
alt.Tooltip("HOUR:O", title="Hour"),
alt.Tooltip("dayname:O", title="Day"),
],
)
.properties(width=w1, height=h1)
.add_params(time_brush, selection_weekday)
)
hour_bar = (
alt.Chart()
.mark_bar()
.encode(
y=alt.Y(
"count()",
scale=alt.Scale(reverse=False),
axis=alt.Axis(title="No. accidents"),
),
x=alt.X("HOUR:O", axis=alt.Axis(title=None, labelAngle=0, orient="top")),
opacity=alt.condition(time_brush, alt.value(1), alt.value(0.2)),
tooltip=[
alt.Tooltip("count()", title="No. accidents"),
alt.Tooltip("HOUR:O", title="Hour"),
],
)
.properties(width=int(w1), height=h2)
.add_params(time_brush)
)
weekday_bar = (
alt.Chart()
.mark_bar()
.encode(
x=alt.X("count()", axis=alt.Axis(title="No. accidents")),
y=alt.Y("dayname:O", axis=None, sort=custom_sort),
opacity=alt.condition(selection_weekday, alt.value(1), alt.value(0.2)),
tooltip=[
alt.Tooltip("count()", title="No. accidents"),
alt.Tooltip("dayname:O", title="Day"),
],
)
.properties(width=w2, height=h1)
.add_params(selection_weekday)
)
time_chart = alt.vconcat(
hour_bar,
alt.hconcat(times_of_day, weekday_bar).resolve_scale(y="shared"),
data=accident_data,
).resolve_scale(color="shared", x="shared")
time_chart
Counter chart
Upon reviewing our dataset, we found information about the health condition of the people involved in the accidents, and we thought it would be interesting to show it as well. We therefore created a bar chart which acts as a counter of the total number of accidents, while also distinguishing between accidents which resulted in injuries and those which did not. Initially, we also distinguished fatalities; however, since very few accidents resulted in death, we decided not to include them. Even though it is a bar chart, the bars do not encode any data and are used only for aesthetic reasons; the counts are encoded as text.
Afterwards, we realized it would be very useful to consistently show the overall number of accidents that satisfy the current filtering conditions. Because of the many variables available for user exploration, it is easy to lose track of how many accidents are being considered. This counter lets the reader interact freely with any chart while remaining aware of the scope of the current filters. Additionally, it is possible to filter accidents with and without injuries.
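The numbers shown by the counter can be sketched as follows. The `keep` predicate stands for the filters coming from the other views, and the `INJURED` values `"Yes"`/`"No"` and the sample rows are hypothetical; the function itself is an illustration, not part of the notebook.

```python
from collections import Counter

def counter_values(accidents, keep=lambda row: True, injured=None):
    """Numbers displayed by the counter view.

    keep: predicate standing for the filters applied in the other views.
    injured: value clicked in the injury chart, or None if nothing is clicked.
    """
    visible = [a for a in accidents if keep(a)]
    selected = [a for a in visible if injured is None or a["INJURED"] == injured]
    return {
        "total": len(accidents),                                   # overall count
        "selected": len(selected),                                 # "Currently selected"
        "by_injured": dict(Counter(a["INJURED"] for a in visible)),  # injury bars
    }

rows = [
    {"INJURED": "Yes", "conditions": "Rain"},
    {"INJURED": "No", "conditions": "Rain"},
    {"INJURED": "No", "conditions": "Clear"},
]
print(counter_values(rows, keep=lambda r: r["conditions"] == "Rain", injured="No"))
# {'total': 3, 'selected': 1, 'by_injured': {'Yes': 1, 'No': 1}}
```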
selection_injured = alt.selection_point(fields=["INJURED"])
w = 80
h = 75
total_chart = (alt.Chart(accident_data).mark_bar(cornerRadius=10).encode(
tooltip=[alt.Tooltip("count()", title="No. accidents")],
)
)
total_text = (
total_chart.mark_text(align="center", baseline="middle", fontSize=15, dx=0, fontWeight="bold",)
.encode(
text=alt.Text("count()"),
color=alt.value("black"),
x=alt.X().axis(labels=False),
).properties(width=int(w), height=int(h), title="Total accidents")
)
selected_text = (total_chart.transform_filter(selection_injured)
.mark_text(
align="center",
baseline="middle",
fontSize=15,
dx=0,
fontWeight="bold",
)
.encode(
text=alt.Text("count()"),
color=alt.value("black"),
x=alt.X().axis(labels=False),
).properties(width=int(w), height=int(h), title="Currently selected")
)
injured_chart = (alt.Chart(accident_data).mark_bar(cornerRadius=10)
.encode(
opacity=alt.condition(selection_injured, alt.value(1), alt.value(0.2)),
tooltip=[alt.Tooltip("count()", title="No. accidents")],
y=alt.Y("INJURED:N", title=None, axis=alt.Axis(ticks=False)),
)
)
injured_text = (injured_chart.mark_text(
align="center",
baseline="middle",
fontSize=15,
dx=0,
fontWeight="bold",
)
.encode(
text=alt.Text("count()"),
color=alt.value("black"),
).properties(width=int(w), height=int(h * 1.5))
)
injured_chart = (injured_chart + injured_text).add_params(selection_injured)
((total_chart + total_text) & (total_chart + selected_text) & injured_chart).configure_axis(grid=False).configure_view(stroke=None)
Cause of accident chart
Even though it is not required to answer any of the questions, we decided to include a chart showing the causes of the accidents. This view is a bar chart with the number of accidents per contributing factor. An aggregation followed by a window (rank) transformation selects the 10 most common contributing factors (for the first vehicle involved in each accident). The contributing factor is encoded by position along the x-axis and the count by bar height. We only display the top 10 because many of the causes are very infrequent.
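The aggregate-then-rank logic of the Altair code can be sketched in plain Python. The `rank` window operation gives tied counts the same rank and skips the following ranks (1, 2, 2, 4, ...), so the filter `rank <= 10` may keep more than ten factors when counts tie. The factor names below are sample values for illustration only.

```python
from collections import Counter

def top_factors(factors, k=10):
    """Aggregate and rank contributing factors, keeping rows with rank <= k.

    Ties share a rank and the following rank skips ahead (1, 2, 2, 4, ...),
    so more than k factors can be returned when counts tie.
    """
    ordered = sorted(Counter(factors).items(), key=lambda item: item[1], reverse=True)
    ranked, rank, previous = [], 0, None
    for position, (factor, count) in enumerate(ordered, start=1):
        if count != previous:
            rank, previous = position, count
        ranked.append((factor, count, rank))
    return [row for row in ranked if row[2] <= k]

factors = (["Driver Inattention/Distraction"] * 3 + ["Unsafe Speed"] * 2
           + ["Following Too Closely"] * 2 + ["Unspecified"])
print(top_factors(factors, k=2))
# [('Driver Inattention/Distraction', 3, 1), ('Unsafe Speed', 2, 2), ('Following Too Closely', 2, 2)]
```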
We also considered showing the number of accidents per vehicle type for each cause. However, the three vehicle types that we studied are highly unbalanced in terms of the total number of accidents, so even when we encoded the vehicle type as color (turning the chart into a stacked bar chart), the chart was not very useful: the bars for the less frequent vehicle types were very small and difficult to compare. Therefore, we decided not to include it in the final visualization. Instead, to see the number of accidents for a given vehicle type, the user can select that vehicle type in the vehicle type chart and this view is updated accordingly.
selection_acc_factor = alt.selection_point(fields=["CONTRIBUTING FACTOR VEHICLE 1"])
alt.Chart(accident_data).mark_bar().encode(
x=alt.X(
"CONTRIBUTING FACTOR VEHICLE 1:N",
axis=alt.Axis(title="Contributing Factor"),
).sort("-y"),
y=alt.Y("count:Q", axis=alt.Axis(title="Count")),
opacity=alt.condition(selection_acc_factor, alt.value(1), alt.value(0.2)),
).transform_aggregate(count="count()", groupby=["CONTRIBUTING FACTOR VEHICLE 1"]
).transform_window(
window=[{"op": "rank", "as": "rank"}],
sort=[{"field": "count", "order": "descending"}],
).transform_filter("datum.rank <= 10"
).add_params(selection_acc_factor
).properties(title="Top 10 Contributing Factors")
Calendar Chart
After creating the first prototype of the visualization, we realized that it did not give the user enough detail about the temporal (day/month) distribution of the accidents, and that the user could not easily select and visualize the accidents of a specific weekday, week or month. Therefore, we decided to add a chart that acts as a calendar, giving the user the ability to select the weeks, months or weekdays of interest, and at the same time acts as a heatmap showing the number of accidents on each day of summer 2018. This view also avoids overloading the user with dropdown menus, which are not very visually pleasing.
The chart is a heatmap in which the spatial channel encodes the day of the week (columns) and the week within each month (rows). The color of each day encodes the number of accidents on that day using a single-hue color scale. Furthermore, the user can select a specific week to filter the data to that week only; the selection is shown by reducing the opacity of the other weeks. Since some days have very similar values, a tooltip shows the exact number of accidents on each day.
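The calendar layout can be sketched as a mapping from a date to its cell. How the notebook's `week` column is computed is not shown here, so the week-within-month formula below is a hypothetical choice; weeks start on Monday to match the chart's column order.

```python
from datetime import date

def calendar_cell(when):
    """Position of a date in the calendar view: (month, week row within the month, weekday column).

    Weeks start on Monday, matching the column order of the chart.
    """
    first_weekday = when.replace(day=1).weekday()  # 0 = Monday
    week_in_month = (when.day - 1 + first_weekday) // 7
    return when.strftime("%B"), week_in_month, when.strftime("%A")

print(calendar_cell(date(2018, 6, 1)))  # ('June', 0, 'Friday')
print(calendar_cell(date(2018, 6, 4)))  # ('June', 1, 'Monday')
```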
Next to the calendar, a heatmap (technically a bar chart, due to issues with Altair) encodes the average daily number of accidents in each month, using the same color scheme as the calendar chart. This view is also interactive: clicking on a month selects it, and the selection is shown by a change in the opacity of the other months. Both views show the same data with the same encoding but at different levels of aggregation: the calendar has a level of detail of a day, while the bar chart has a level of detail of a month.
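The monthly value is a mean per day rather than a raw total, so that 30-day months (June, September) can be compared with 31-day months (July, August). The notebook divides a window count by a precomputed `num_days_in_month` column; the sketch below does the same in plain Python with `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month. The sample dates are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date

def mean_daily_accidents(accident_dates):
    """Average number of accidents per day for each (year, month)."""
    per_month = Counter((d.year, d.month) for d in accident_dates)
    return {
        (year, month): count / calendar.monthrange(year, month)[1]  # days in the month
        for (year, month), count in per_month.items()
    }

dates = [date(2018, 6, 15)] * 60 + [date(2018, 7, 15)] * 62
print(mean_daily_accidents(dates))  # {(2018, 6): 2.0, (2018, 7): 2.0}
```

Although July has more accidents in total in this example, both months have the same daily rate once the number of days is taken into account.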
From a perceptual point of view, the single-hue color scale makes it easy to distinguish the number of accidents per day and to see which days had more accidents. The familiar calendar structure makes the visualization easy to understand, while the grouping of days into months relies on containment.
order = [
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday",
]
month_order = ["June", "July", "August", "September"]
selection_month = alt.selection_point(fields=["monthname"])
selection_week= alt.selection_point(fields=["week"])
calendars = (
alt.Chart(accident_data)
.transform_filter(
selection_month
)
.mark_rect()
.encode(
row=alt.Row("monthname:O", sort=month_order, spacing=0, title=None),
x=alt.X("dayname:O", sort=order, axis=alt.Axis(title=None)),
y=alt.Y("week:O", title=None, axis=alt.Axis(labels=False)),
color=alt.Color(
"count()",
scale=alt.Scale(scheme="greens"),
legend=alt.Legend(title="No. accidents", orient="top"),
),
opacity=alt.condition(
selection_week,
alt.value(1),
alt.value(0.2),
),
tooltip=[
alt.Tooltip("fulldate:N", title="Date"),
alt.Tooltip("count()", title="No. accidents"),
],
)
.properties(width=int(w), height=int(h / 4)).resolve_scale(y="independent").add_params(selection_week)
)
month_bar = (
alt.Chart(accident_data)
.mark_bar(cornerRadius=10)
.transform_window(
total_acc="count()",
frame=[None, None],
groupby=["monthname"],
)
.transform_calculate(
mean_accidents=alt.datum.total_acc / alt.datum.num_days_in_month,
)
.encode(
y=alt.Y(
"monthname:N",
sort=month_order,
title=None,
axis=alt.Axis(labels=False, ticks=False),
),
color=alt.Color(
"mean_accidents:Q", scale=alt.Scale(scheme="greens")
),
opacity=alt.condition(selection_month, alt.value(1), alt.value(0.2)),
tooltip=[
alt.Tooltip("monthname:N", title="Month"),
alt.Tooltip("count()", title="No. accidents"),
alt.Tooltip("mean_accidents:Q", format=",.2f", title="Mean accidents"),
],
)
.properties(width=int(h / 4), height=int(h))
.add_params(selection_month)
)
text = month_bar.mark_text(
align="center",
baseline="middle",
fontSize=15,
dx=0,
fontWeight="bold",
).encode(
text=alt.Text("monthname:N"),
color=alt.value("black"),
)
month_bar+text | calendars
Layout refinement
After having implemented the different views, many changes were made to the layout of the visualization as many of the charts had changed and further views had been added. Furthermore, with the acquired knowledge of the different views, we were able to better design the layout.
To begin with, the new layout includes the views which were not present in the first prototype. Furthermore, the order and distribution of the views have been modified. As shown in the included image, the new layout makes better use of the available screen space: it is more compact and allows the whole visualization to be seen without scrolling.
The views are also arranged more coherently, with closely related views placed next to each other. For example, the vehicle and borough charts are now contiguous, and the time series and calendar charts are also together. Moreover, the map is now in the center of the visualization, as it is the main view and the one used to select the data.
In the following section the implemented version of this layout is presented and further described.

Final design and inter-view interactions
In this section, the final design of the visualization is presented. The views are presented in the same order as in the previous section, with the map first and the calendar chart last. The interactions between the views are also described. The visualization is shown in the next code cell. We begin by analyzing the overall design of the visualization and then analyze each view individually in the text cell after the visualization.
It is important to note that some small changes were made to the views when integrating them into the final layout, so some of the charts look slightly different from the versions shown previously. Nevertheless, the changes are minor and do not affect the analysis of the views.
The visualization has been obtained through an intensive iterative process in which many preliminary versions of every chart have existed. After having obtained the final version of each chart, the inter-view interactions were implemented. The interactions are described in the following sections.
When it comes to the design of the visualization, ignoring the inter-view interactions, the visualization is composed of many views. Overall all of them are coordinated (linked views) view. Except for a few exception (such as the global count) all the views show the same elements (rows of the dataset) at all times while each of them encodes a different variable. When one selection is made through an interaction all the other views are updated, so that all views show the same accidents.
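The linked-views behaviour described above can be thought of as cross-filtering over one shared selection state. The following is a minimal, library-free sketch under stated assumptions, not how graphs.py implements it: the toy records and the field names (`borough`, `vehicle`, `weather`, `injured`) are hypothetical. Each bar view skips the selection made on its own field, so the clicked bar can be highlighted instead of filtered away, while the KPI counts apply every active selection.

```python
from collections import Counter

# Hypothetical toy records; the real dataset's columns and values may differ.
ACCIDENTS = [
    {"borough": "MANHATTAN", "vehicle": "Taxi", "weather": "Rain", "injured": True},
    {"borough": "MANHATTAN", "vehicle": "Ambulance", "weather": "Clear", "injured": False},
    {"borough": "BROOKLYN", "vehicle": "Taxi", "weather": "Clear", "injured": True},
    {"borough": "QUEENS", "vehicle": "Fire truck", "weather": "Rain", "injured": False},
]


def apply_selections(rows, selections):
    """Keep the rows that satisfy every active selection (field -> allowed values)."""
    return [
        row for row in rows
        if all(row[field] in allowed for field, allowed in selections.items())
    ]


def view_counts(rows, selections, field):
    """Counts for one bar view; its own selection is skipped so it can be highlighted."""
    others = {f: allowed for f, allowed in selections.items() if f != field}
    return Counter(row[field] for row in apply_selections(rows, others))


def render_views(rows, selections):
    """Recompute every linked view from the same shared selection state."""
    selected = apply_selections(rows, selections)
    return {
        "kpi": {
            "total": len(rows),
            "selected": len(selected),
            "injured": sum(row["injured"] for row in selected),
        },
        "borough_bars": view_counts(rows, selections, "borough"),
        "vehicle_bars": view_counts(rows, selections, "vehicle"),
        "weather_cells": view_counts(rows, selections, "weather"),
    }


# Clicking the "Taxi" bar adds one selection; every other view is updated.
views = render_views(ACCIDENTS, {"vehicle": {"Taxi"}})
print(views["kpi"])                 # {'total': 4, 'selected': 2, 'injured': 2}
print(dict(views["borough_bars"]))  # {'MANHATTAN': 1, 'BROOKLYN': 1}
print(dict(views["vehicle_bars"]))  # {'Taxi': 2, 'Ambulance': 1, 'Fire truck': 1}
```

Keeping a single selection dictionary as the only source of truth is what guarantees that all views always describe the same subset of accidents.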
All the views use encodings that take human perception into account and follow its basic principles, such as containment, color hue and spatial position. Furthermore, the views are designed to be as simple as possible, avoiding unnecessary elements and focusing on the data. They are also as compact as possible, so that the user can see all of them at the same time without scrolling. This makes it easier to compare the different views and to understand the visualization as a whole.
Color has only been used when necessary. A qualitative scheme encodes the boroughs in both the map points and the borough bar chart, so that the user can easily relate the two views. On the other hand, three heatmaps encode amounts using single-hue color scales. The three charts use different, clearly distinguishable color schemes: since each chart has a different range of values, each has its own color scale, and a different hue per chart makes it clear that colors are not comparable across charts. Furthermore, the color scales have been chosen so that their colors are easy to tell apart and the user can easily see the differences between values.
In all the views, where possible, the user can hover over the elements to obtain more information about them. This is done through tooltips, which make comparisons easier when the relative difference between two encoded quantities is small and hard to perceive.
Furthermore, where possible, interaction is achieved by clicking on marks: when the user clicks on a mark, the data displayed in all the views is filtered.
Finally, the layout has been designed to minimize the number of perceived axes: where possible, the views are aligned with each other so that their axes line up. We felt that this reduces the visual load and makes the visualization more aesthetically pleasing.
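The single-hue scales can be illustrated as a mapping from a count to the lightness of one fixed hue, normalized to each chart's own range of values. This is a hypothetical, library-free sketch: the hue values (about purple and about teal) and the lightness bounds are assumptions chosen for illustration, and a plotting library's built-in sequential schemes would normally be used instead.

```python
import colorsys


def single_hue_color(value, vmin, vmax, hue, light_min=0.25, light_max=0.92, saturation=0.6):
    """Map value in [vmin, vmax] to a hex colour of one hue; larger values are darker."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp to the chart's own domain
    lightness = light_max - t * (light_max - light_min)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def color_scale(counts, hue):
    """Give each chart its own domain, so each one uses the full light-to-dark range."""
    vmin, vmax = min(counts.values()), max(counts.values())
    return {key: single_hue_color(n, vmin, vmax, hue) for key, n in counts.items()}


# Hypothetical counts: the two charts have very different ranges of values.
weather = color_scale({"Clear": 900, "Rain": 300, "Snow": 20}, hue=0.75)  # about purple
hours = color_scale({8: 12, 12: 30, 17: 45}, hue=0.5)                     # about teal
print(weather)
print(hours)
```

Because each chart is normalized separately, a dark cell means "high for this chart" only, which is why distinct hues help keep the scales from being compared with each other.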
from graphs import *  # custom helper functions defined in graphs.py
data = get_accident_data(fname='dataset_v1.csv')
accident_data = get_weather_data(data)
chart = make_visualization(accident_data, use_interval=False)
chart  # display the multiview visualization
All the views allow for interactivity and are linked to each other. The views are the following:
- KPI (counts) chart: this chart serves as an overview of the data. It always displays the total number of accidents in the dataset and the number currently selected through the interactions. It also distinguishes between accidents with and without injuries, allowing the user to select either group. The view encodes the counts using the text channel and places each one in a colored rectangle for aesthetic and clarity reasons.
- Map: shows the location of the accidents and the number of accidents per borough. It allows the user to select a borough or a group of points, which highlights them and updates the other views so that they only show the data corresponding to the selection. The location of each accident is encoded using the spatial channel, and the borough in which it happened using the color channel. When a borough is clicked, the opacity of the other boroughs is reduced in order to highlight the selected one. A similar effect is achieved when a group of points is selected.
- Borough bar chart: shows the number of accidents per borough. It encodes both the number of accidents and the borough using the spatial channel. When the user clicks on a bar, that bar is highlighted and the other views are updated to only show the data corresponding to the selected borough. It uses the same color scheme as the dots on the map, so that the user can easily relate the two views.
- Vehicle type chart: shows the number of accidents per vehicle type. It encodes both the number of accidents and the vehicle type using the spatial channel. When the user clicks on a bar, that bar is highlighted and the other views are updated to only show the data corresponding to the selected vehicle type. It uses icons to encode the vehicle type, so that the user can recognize each vehicle type at a glance.
- Weather condition chart: shows the number of accidents per weather condition using a 1-D heatmap. It encodes the condition using the spatial channel and the number of accidents using the color channel (a single-hue palette). When the user clicks on a cell, that cell is highlighted and the other views are updated to only show the data corresponding to the selected weather condition.
- Calendar chart: shows the number of accidents per day of the week and week of the year. It encodes the day of the week and the week of the year using the spatial channel and the number of accidents using the color channel (a single-hue palette). When the user clicks on a day, the week containing that day is highlighted and the other views are updated to only show the data corresponding to the selected week. When a specific day of the week is selected in the time of day chart, the corresponding day of the week in the calendar chart is highlighted. Juxtaposed to the right of the calendar chart and sharing its month (y) axis, a heatmap-like bar chart shows the average number of accidents per month. When the user clicks on a month, that month is highlighted and the other views are updated to only show the data corresponding to the selected month.
- Time of day chart: shows the number of accidents per hour of the day and day of the week. It is composed of three juxtaposed views coordinated through linked highlighting: one heatmap and two bar charts. Each of the heatmap's axes is shared with one of the bar charts. The heatmap encodes the day of the week and the hour of the day using the spatial channel and the number of accidents using the color channel (a single-hue palette). When the user clicks on a specific rectangle of the heatmap, the corresponding bars in both bar charts are highlighted. The top bar chart encodes the number of accidents per hour; when the user clicks on a bar, that bar and the corresponding rectangles in the heatmap are highlighted. Similarly, the right-hand bar chart encodes the number of accidents per day of the week; when the user clicks on a bar, that bar and the corresponding rectangles in the heatmap are highlighted. It is worth noting that the x-axis is repeated in both the heatmap and the top bar chart to make it easier to select a specific hour in the top bar chart.
- Contributing factors chart: shows the top 10 contributing factors for the accidents in a bar chart.
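Both temporal heatmaps are aggregations of the accident timestamps: the calendar chart counts accidents per (week of the year, day of the week) cell, and the time of day heatmap counts them per (day of the week, hour) cell, with its two bar charts being the heatmap's column and row totals, which is why each can share one of its axes. The minimal sketch below computes these cells from a few hypothetical timestamps using only the standard library; the timestamp format is an assumption, and so is the use of ISO week numbers, since the text does not say which week convention the calendar uses.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical crash timestamps; the real column name and format may differ.
TIMESTAMPS = [
    "2018-06-04 12:15", "2018-06-04 12:40", "2018-06-05 08:05",
    "2018-07-02 12:30", "2018-07-07 23:55",
]


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def calendar_cells(timestamps):
    """Counts per (ISO week of the year, day of the week) cell of the calendar chart."""
    cells = Counter()
    for ts in timestamps:
        d = parse(ts)
        cells[(d.isocalendar()[1], DAYS[d.weekday()])] += 1
    return cells


def time_of_day_views(timestamps):
    """Heatmap counts per (day, hour) plus the two marginal bar charts."""
    heat = Counter()
    for ts in timestamps:
        d = parse(ts)
        heat[(DAYS[d.weekday()], d.hour)] += 1
    by_hour, by_day = Counter(), Counter()
    for (day, hour), n in heat.items():
        by_hour[hour] += n  # top bar chart: shares the heatmap's hour axis
        by_day[day] += n    # right-hand bar chart: shares the heatmap's day axis
    return heat, by_hour, by_day


calendar = calendar_cells(TIMESTAMPS)
heat, by_hour, by_day = time_of_day_views(TIMESTAMPS)
print(calendar[(23, "Mon")])                           # 2
print(heat[("Mon", 12)], by_hour[12], by_day["Sat"])  # 3 3 1
```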
How to solve tasks¶
Which weather condition and type of vehicle were present in the majority of accidents each month? And in the combination of all the months?¶
At the bottom-left corner of the visualization, click on the button of the month you wish to select. If you want to see the combination of all months, you can skip this step, or double-click to reset the selection.
Look at the weather conditions chart on the right-hand side of the screen. The weather condition present in the most accidents is the one shown in the darkest purple.
Look at the vehicle type bar chart in the center of the view. The vehicle type present in the most accidents corresponds to the longest bar.
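This task amounts to finding the most frequent weather condition and vehicle type per month and over all months, which is what the darkest cell and the longest bar show. A minimal sketch for cross-checking such a reading, assuming hypothetical (month, weather, vehicle) tuples rather than the notebook's actual columns:

```python
from collections import Counter, defaultdict

# Hypothetical toy records (month, weather condition, vehicle type).
RECORDS = (
    [("June", "Clear", "Taxi")] * 4
    + [("June", "Rain", "Ambulance")]
    + [("July", "Rain", "Ambulance")] * 3
    + [("July", "Clear", "Taxi")]
)


def most_common_per_month(records):
    """Most frequent weather and vehicle for each month and for all months combined."""
    weather, vehicle = defaultdict(Counter), defaultdict(Counter)
    for month, condition, vehicle_type in records:
        for key in (month, "All months"):  # every record also counts toward the total
            weather[key][condition] += 1
            vehicle[key][vehicle_type] += 1
    # most_common(1) breaks ties by first occurrence, so ties should be checked by eye.
    return {
        key: (weather[key].most_common(1)[0][0], vehicle[key].most_common(1)[0][0])
        for key in weather
    }


print(most_common_per_month(RECORDS))
# {'June': ('Clear', 'Taxi'), 'All months': ('Clear', 'Taxi'), 'July': ('Rain', 'Ambulance')}
```

As the toy data shows, the answer for a single month can differ from the answer for all months combined, which is why the month filter matters.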
In which area and at what hour did the majority of accidents each month happen? And in the combination of all the months?¶
At the bottom-left corner of the visualization, click on the button of the month you wish to select. If you want to see the combination of all months, you can skip this step, or double-click to reset the selection.
To see in which area the majority of accidents happened, look at the top-left map to see the concentration of accidents, or at the bar chart to its right to see the number of accidents per borough.
To see at what hour the majority of accidents happened, look at the teal heatmap at the bottom of the screen and find the tallest bar in the hourly bar chart on top of it.
Which area presented the majority of taxi accidents during rainy days in June on Mondays at noon (12 pm)?¶
At the center of the view, click on the bar corresponding to Taxi vehicles.
At the right-hand side of the screen, click on the button corresponding to rainy days. (If you want to include any weather condition with rain, use shift-click to select multiple conditions.)
At the bottom-left corner of the visualization, click on the June button to select said month.
At the bottom of the screen, either click on the Monday, 12h cell of the heatmap, or click on the Monday bar in the right-hand bar chart and on the 12h bar in the top bar chart.
To see in which area the majority of accidents happened, look at the top-left map to see the concentration of accidents, or at the bar chart to its right to see the number of accidents per borough.
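Each click in the steps above adds one condition, and the linked views then show the conjunction of all of them, while shift-clicking adds alternatives within a single condition. A hypothetical sketch of the equivalent query, useful for cross-checking the map reading; the field names and weather labels are assumptions, not the dataset's actual values:

```python
from collections import Counter

# Hypothetical toy records; field names and weather labels are assumptions.
FIELDS = ("vehicle", "weather", "month", "weekday", "hour", "borough")
ROWS = [dict(zip(FIELDS, values)) for values in [
    ("Taxi", "Rain", 6, "Monday", 12, "MANHATTAN"),
    ("Taxi", "Rain", 6, "Monday", 12, "MANHATTAN"),
    ("Taxi", "Heavy rain", 6, "Monday", 12, "BROOKLYN"),
    ("Taxi", "Clear", 6, "Monday", 12, "QUEENS"),
    ("Ambulance", "Rain", 6, "Monday", 12, "QUEENS"),
    ("Taxi", "Rain", 6, "Tuesday", 12, "QUEENS"),
]]


def filter_rows(rows, **conditions):
    """Conjunction over fields; each condition is a set of allowed values (shift-click adds more)."""
    return [r for r in rows if all(r[f] in allowed for f, allowed in conditions.items())]


matches = filter_rows(
    ROWS, vehicle={"Taxi"}, weather={"Rain", "Heavy rain"},
    month={6}, weekday={"Monday"}, hour={12},
)
by_borough = Counter(r["borough"] for r in matches)
print(by_borough.most_common(1))  # [('MANHATTAN', 2)]
```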
Which day had more accidents during clear days in July in Manhattan?¶
At the right-hand side of the screen, click on the button corresponding to clear weather conditions.
At the bottom-left corner of the visualization, click on the July button to select said month.
At the top of the screen, either click on Manhattan on the map, or click on the Manhattan bar in the bar chart to the right of the map.
To see which day had the most accidents, look at the calendar below the map and hover the mouse over the day with the darkest green color. The tooltip shows both the number of accidents and the date of that day.